Assignment

Assignment as part of Visual Analytics Project titled VISTAS - Visualising Industry Skill TAlent Shifts.

Louis Chong Jia Jun (louis.chong.2019@mitb.smu.edu.sg) https://www.linkedin.com/in/louis-chong-jia-jun
04-03-2021

1.0 Overview

1.1 Background

The LinkedIn and World Bank Group have partnered and released data from 2015 to 2019 that focuses on 100+ countries with at least 100,000 LinkedIn members each, distributed across 148 industries and 50,000 skill categories. This data aims to help government and researchers understand rapidly evolving labor markets with detailed and dynamic data (The World Bank Group, 2021).

1.2 Data

The proposed datasets for analyses will mainly come from the LinkedIn-World Bank partnership. This will be complemented with GDP per capita from World Bank.

In this assignment, the following datasets from LinkedIn-World Bank will be used:

The GDP per capita dataset, which comprises annual time-series GDP per capita (current USD) for countries, is also used.

1.3 Critiques of Existing Data Visualisations

Simple data visualisations are publicly available at LinkedIn-World Bank webpage. However, they are limited in the exploration of trends because they only allow one parameter of choice.

In terms of how variables relate to one another, there are limitations to studies done so far. Studies on relationship between GDP per capita growth rate and population growth are usually done at country level, so more studies can be done to analyse the relationship between GDP per capita growth rate and employment growth or migration in each industry. Figure 1 shows the relationship between employment growth and interstate migration (Tunny, G., 2015). Here, analysis is done at state level, but not industry level.

Figure 1: Employment Growth vs Interstate Migration

Figure 2 shows the relationship between GDP per capita and share of employment in business services in 2000 for countries in Europe Kox, H. & Rubalcaba, L., 2007. However, this does not show how a change in migration or employment growth in an industry will cause the change in GDP per capita.

Figure 2: GDP per capita vs Share of employment in business services in Europe in 2000

Overall, these data visualisations mentioned are static and do not allow readers to find out how the change in GDP per capita is influenced by the change in employment and migration in each industry or skill and how the change in employment in influenced by change in migration in each industry or skill.

1.4 Motivation

Through this project, we will create a user-friendly, interactive dashboard that allows individuals and countries to view their competitive advantage and understand the evolving labour markets across the world in four areas: skills, occupations, migration and industries. Comparison will be allowed at both country-level and industry-level. The project will help answer the following questions:

For individuals:

For countries:

This assignment will build regression and correlation plots to answer questions 1A, 1B and 2E.

1.5 Sketch of Data Visualisation

Figure 3 shows the sketch of the proposed data visualisation.

Figure 3: Sketch of Proposed Data Visualisation

1.6 Advantages and Components of Data Visualisation

For this assignment, the proposed data visualisation is an interactive dashboard with regression and correlation plots. It can be filtered according to country, region, year, income level, industry section, industry, skill category and skill to show different plots. Users can choose two parameters and find out the relationship between GDP per capita growth and industry employment growth/industry migration/skill migration, or industry employment growth and industry/skill migration. Correlation of GDP per capita growth, industry employment growth, industry migration and skill migration can also be determined.

The following packages are used to build the respective plots:

Unlike R Shiny, we are unable to have an interactive filter on R Markdown that can filter the dataset and directly change the regression and correlation plots. Crosstalk, d3scatter and lazyeval packages allow interactive filter on R Markdown, but are only compatible with data tables and scatter plots. Thus, in this assignment, we will first use crosstalk, d3scatter and lazyeval packages to filter and explore the distributions of variables on scatter plots, before we manually adjust the code and filter the dataset, which will change the regression and correlation plots.

2.0 Step-by-Step Data Visualisation Preparation

2.1 Installing and Launching R Packages

First, we will install and launch the relevant R packages.

If d3scatter package has not been installed before, we can run the following two commands.

packages = c('devtools')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}
devtools::install_github("jcheng5/d3scatter")

If d3scatter package has already been installed, we can run the following command to install and launch all relevant R packages. In addition to the packages mentioned in the earlier sections, the readxl package is required to read xlsx files and the tidyverse package is needed to import data (read_csv), do data cleaning and manipulation (dplyr, tidyr). The olsrr package is also used to find out detailed results of the regression plots for further discussion in this assignment.

packages = c('readxl', 'devtools', 'crosstalk', 'd3scatter',
             'lazyeval', 'tidyverse', 'ggstatsplot', 'olsrr')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

2.2 Load Datasets onto R

Next, we will load the necessary datasets for this assignment, which include GDP per capita, industry employment growth, industry migration, skill migration and industry skills need.

industryEmploymentGrowth <- read_excel("data/public_use-industry-employment-growth.xlsx",
                                       sheet = "Growth from Industry Transition")

industrySkillsNeeds <- read_excel("data/public_use-industry-skills-needs.xlsx",
                                  sheet = "Industry Skills Needs")

industryMigration <- read_excel("data/public_use-talent-migration.xlsx",
                                sheet = "Industry Migration")

skillMigration <- read_excel("data/public_use-talent-migration.xlsx",
                             sheet = "Skill Migration")

gdpPerCapita <- read_csv("data/API_NY.GDP.PCAP.CD_DS2_en_csv_v2_2055804.csv")

2.3 Data Wrangling

After loading the datasets, we will do the necessary data cleaning and manipulation.

The GDP per capita dataset contains GDP per capita for various countries over many years. First, we change the column name from “Country Name” to “country_name” to be consistent with other datasets. We also change the column names e.g. from “2019” to “gpc2019”, as the numerical labelling may result in difficulties in data wrangling. Only relevant columns are kept i.e. country, GDP per capita from 2014 to 2019. Next, to determine the change in GDP per capita year on year for 2015 to 2019, we apply the formulas and keep only relevant columns i.e. country, GDP per capita growth for 2015 to 2019. Lastly, we use pivot_longer to place GDP per capita growth in rows instead of columns. The name of the column storing the original values of the columns is titled “GDP_per_capita_growth” and the name of the column storing the original names of the columns is titled “year”. We also edit to keep only values of the year i.e. from “gpc2019” to “2019”.

names(gdpPerCapita)[1] <- "country_name"

names(gdpPerCapita)[59] <- "gpc2014"
names(gdpPerCapita)[60] <- "gpc2015"
names(gdpPerCapita)[61] <- "gpc2016"
names(gdpPerCapita)[62] <- "gpc2017"
names(gdpPerCapita)[63] <- "gpc2018"
names(gdpPerCapita)[64] <- "gpc2019"

gdpPerCapita <- gdpPerCapita[-c(2:58, 65)]

gdpPerCapita$g2015 <- (gdpPerCapita$gpc2015/gdpPerCapita$gpc2014) - 1
gdpPerCapita$g2016 <- (gdpPerCapita$gpc2016/gdpPerCapita$gpc2015) - 1
gdpPerCapita$g2017 <- (gdpPerCapita$gpc2017/gdpPerCapita$gpc2016) - 1
gdpPerCapita$g2018 <- (gdpPerCapita$gpc2018/gdpPerCapita$gpc2017) - 1
gdpPerCapita$g2019 <- (gdpPerCapita$gpc2019/gdpPerCapita$gpc2018) - 1

gdpPerCapita <- gdpPerCapita[-c(2:7)] %>%
  pivot_longer(cols = c(`g2015`, `g2016`, `g2017`, `g2018`, `g2019`),
               names_to = "year",
               values_to = "GDP_per_capita_growth") %>%
  mutate(year = str_sub(year, 2, -1))

The industry employment growth dataset contains employment growth rate for various industries in various countries from 2015 to 2019. We will keep only relevant columns i.e. country, region, income, industry, employment growth for 2015 to 2019. Next, we use pivot_longer to place industry employment growth in rows instead of columns. The name of the column storing the original values of the columns is titled “employment_growth” and the name of the column storing the original names of the columns is titled “year”. We also edit to keep only values of the year i.e. from “growth_rate_2019” to “2019”.

industryEmploymentGrowth <- industryEmploymentGrowth[-c(1, 5, 7)] %>%
  pivot_longer(cols = c(`growth_rate_2015`, `growth_rate_2016`, `growth_rate_2017`,
                        `growth_rate_2018`, `growth_rate_2019`),
               names_to = "year",
               values_to = "employment_growth") %>%
  mutate(year = str_sub(year, 13, -1))

The industry migration dataset contains migration data for various industries in various countries from 2015 to 2019. We will keep only relevant columns i.e. country, region, income, industry, migration for 2015 to 2019. Next, we use pivot_longer to place industry migration in rows instead of columns. The name of the column storing the original values of the columns is titled “industry_migration” and the name of the column storing the original names of the columns is titled “year”. We also edit to keep only values of the year i.e. from “net_per_10k_2019” to “2019”. Lastly, we will normalise industry migration by dividing by 1000 i.e. industry migration (net per 10) to have a common scale with GDP per capita growth and industry employment growth.

industryMigration <- industryMigration[-c(1, 5, 7)] %>%
  pivot_longer(cols = c(`net_per_10K_2015`, `net_per_10K_2016`, `net_per_10K_2017`,
                        `net_per_10K_2018`, `net_per_10K_2019`),
               names_to = "year",
               values_to = "industry_migration") %>%
  mutate(year = str_sub(year, 13, -1)) %>%
  mutate(industry_migration = industry_migration / 1000)

The skill migration dataset contains migration data for various skills in various countries from 2015 to 2019. We will keep only relevant columns i.e. country, region, income, skill, skill migration for 2015 to 2019. Next, we use pivot_longer to place skill migration in rows instead of columns. The name of the column storing the original values of the columns is titled “skill_migration” and the name of the column storing the original names of the columns is titled “year”. We also edit to keep only values of the year i.e. from “net_per_10k_2019” to “2019”. Lastly, we will normalise skill migration by dividing by 1000 i.e. skill migration (net per 10) to have a common scale with GDP per capita growth and industry employment growth.

skillMigration <- skillMigration[-c(1, 5)] %>%
  pivot_longer(cols = c(`net_per_10K_2015`, `net_per_10K_2016`, `net_per_10K_2017`,
                        `net_per_10K_2018`, `net_per_10K_2019`),
               names_to = "year",
               values_to = "skill_migration") %>%
  mutate(year = str_sub(year, 13, -1)) %>%
  mutate(skill_migration = skill_migration / 1000)

The industry skills need dataset contains the top 10 skill needs for each industry from 2015 to 2019. We will merge skill migration and industry skill need datasets to form the industrySkillMigration data table and show the industries which the skills are needed in.

industrySkillsNeeds <- industrySkillsNeeds[-c(2, 7)]
industrySkillMigration <- merge(skillMigration, industrySkillsNeeds,
                                by = c("year", "skill_group_category",
                                       "skill_group_name"))

Various datasets are then merged to have various visualisations of regression and correlation. They are as follows.

master1 <- merge(industryEmploymentGrowth, gdpPerCapita,
                 by = c("country_name", "year"))
master2 <- merge(industryMigration, gdpPerCapita,
                 by = c("country_name", "year"))
master3 <- merge(skillMigration, gdpPerCapita,
                 by = c("country_name", "year"))
master4 <- merge(industryEmploymentGrowth, industryMigration,
                 by = c("country_name", "year", "wb_region", "wb_income",
                        "isic_section_name", "industry_name"))
master5 <- merge(industryEmploymentGrowth, industrySkillMigration,
                 by = c("country_name", "year", "wb_region", "wb_income",
                        "isic_section_name", "industry_name"))
master6 <- merge(master5, master2,
                 by = c("country_name", "year", "wb_region", "wb_income",
                        "isic_section_name", "industry_name"))

2.4 Interactive Scatter Plots

For simplicity in this assignment, we will use master6, even though there may be less observations when the data tables are merged.

In this section, we will use crosstalk and d3scatter packages to create interactive scatter plots. They are filtered by country, region, year, income level, industry section, industry, skill category and skill. By default, all options are chosen and multiple options can selected for the filters. They are also coloured by year to show the differences over the years. The limitations are the inability to adjust the orientation of the axes labels and to place the reference lines at zero for aesthetic purpose. Nonetheless, the interactive scatter plots are more for exploratory purpose for this assignment, so we won’t adjust the aesthetics of the scatter plots further. In addition, due to the long processing time of the interactive scatter plot and resulting large size of the R Markdown file, we will only run the code of one of the interactive scatter plot (Employment growth vs Industry migration). The remaining plots will be illustrated via image format.

The following scatter plots are plotted:

shared_master <- SharedData$new(master6)
bscols(widths = c(3,NA),
       list(
         filter_select("country_name", "Country",
                       shared_master, ~country_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("year", "Year",
                       shared_master, ~year,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("wb_region", "Region",
                       shared_master, ~wb_region,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("wb_income", "Income Level",
                       shared_master, ~wb_income,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("isic_section_name", "Industry Section",
                       shared_master, ~isic_section_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("industry_name", "Industry",
                       shared_master, ~industry_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("skill_group_category", "Skill Category",
                       shared_master, ~skill_group_category,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("skill_group_name", "Skill",
                       shared_master, ~skill_group_name,
                       allLevels = TRUE, multiple = TRUE)
       ),
       d3scatter(shared_master, ~employment_growth, ~GDP_per_capita_growth, ~year,
                 x_label = "Employment Growth",
                 y_label = "GDP Per Capita Growth",
                 width = "100%", height = 500)
)
GDP per capita vs Employment Growth
bscols(widths = c(3,NA),
       list(
         filter_select("country_name", "Country",
                       shared_master, ~country_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("year", "Year",
                       shared_master, ~year,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("wb_region", "Region",
                       shared_master, ~wb_region,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("wb_income", "Income Level",
                       shared_master, ~wb_income,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("isic_section_name", "Industry Section",
                       shared_master, ~isic_section_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("industry_name", "Industry",
                       shared_master, ~industry_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("skill_group_category", "Skill Category",
                       shared_master, ~skill_group_category,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("skill_group_name", "Skill",
                       shared_master, ~skill_group_name,
                       allLevels = TRUE, multiple = TRUE)
       ),
       d3scatter(shared_master, ~industry_migration, ~GDP_per_capita_growth, ~year,
                 x_label = "Industry Migration (Net Per 10)",
                 y_label = "GDP Per Capita Growth",
                 width = "100%", height = 500)
)
GDP per capita vs Industry migration
bscols(widths = c(3,NA),
       list(
         filter_select("country_name", "Country",
                       shared_master, ~country_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("year", "Year",
                       shared_master, ~year,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("wb_region", "Region",
                       shared_master, ~wb_region,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("wb_income", "Income Level",
                       shared_master, ~wb_income,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("isic_section_name", "Industry Section",
                       shared_master, ~isic_section_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("industry_name", "Industry",
                       shared_master, ~industry_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("skill_group_category", "Skill Category",
                       shared_master, ~skill_group_category,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("skill_group_name", "Skill",
                       shared_master, ~skill_group_name,
                       allLevels = TRUE, multiple = TRUE)
       ),
       d3scatter(shared_master, ~skill_migration, ~GDP_per_capita_growth, ~year,
                 x_label = "Skill Migration (Net Per 10)",
                 y_label = "GDP Per Capita Growth",
                 width = "100%", height = 500)
)
GDP per capita vs Skill migration
bscols(widths = c(3,NA),
       list(
         filter_select("country_name", "Country",
                       shared_master, ~country_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("year", "Year",
                       shared_master, ~year,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("wb_region", "Region",
                       shared_master, ~wb_region,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("wb_income", "Income Level",
                       shared_master, ~wb_income,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("isic_section_name", "Industry Section",
                       shared_master, ~isic_section_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("industry_name", "Industry",
                       shared_master, ~industry_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("skill_group_category", "Skill Category",
                       shared_master, ~skill_group_category,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("skill_group_name", "Skill",
                       shared_master, ~skill_group_name,
                       allLevels = TRUE, multiple = TRUE)
       ),
       d3scatter(shared_master, ~industry_migration, ~employment_growth, ~year,
                 x_label = "Industry Migration (Net Per 10)",
                 y_label = "Employment Growth",
                 width = "100%", height = 500)
)
bscols(widths = c(3,NA),
       list(
         filter_select("country_name", "Country",
                       shared_master, ~country_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("year", "Year",
                       shared_master, ~year,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("wb_region", "Region",
                       shared_master, ~wb_region,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("wb_income", "Income Level",
                       shared_master, ~wb_income,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("isic_section_name", "Industry Section",
                       shared_master, ~isic_section_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("industry_name", "Industry",
                       shared_master, ~industry_name,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("skill_group_category", "Skill Category",
                       shared_master, ~skill_group_category,
                       allLevels = TRUE, multiple = TRUE),
         filter_select("skill_group_name", "Skill",
                       shared_master, ~skill_group_name,
                       allLevels = TRUE, multiple = TRUE)
       ),
       d3scatter(shared_master, ~skill_migration, ~employment_growth, ~year,
                 x_label = "Skill Migration (Net Per 10)",
                 y_label = "Employment Growth",
                 width = "100%", height = 500)
)
Employment growth vs Skill migration

In this assignment, we will focus on 2019 dataset for information and communication industry or tech skills, given the relevance to SMU MITB students. By applying the filters i.e. year (2019), industry section (information and communication) and skill category (tech skills), the above interactive scatter plots will produce Figures 4-8.

Figure 4: GDP per capita vs Employment Growth
Figure 5: GDP per capita vs Industry migration
Figure 6: GDP per capita vs Skill migration
Figure 7: Employment growth vs Industry migration
Figure 8: Employment growth vs Skill migration

2.5 Filtered Datasets

Next, we will edit the code to filter master6. Two filtered data tables are created, one filtered by year 2019 and information and communication industry and the other filtered by year 2019 and tech skills. These will be used to find specific results under the regression and correlation plots. With the results, countries can find out how GDP per capita growth is impacted by changes in employment and migration for information and communication industry and tech skill. Users can also find out the relationship between employment growth and migration for information and communication industry and tech skill to find potential job opportunities.

filtered_master1 <- master6 %>%
  filter(isic_section_name == "Information and communication") %>%
  filter(year == 2019)

filtered_master2 <- master6 %>%
  filter(skill_group_category == "Tech Skills") %>%
  filter(year == 2019)

2.6 Regression Plots

Using the filtered data tables, we will create the respective regression plots as seen below. ggstatsplot package is used instead of ggplot package, as it allows histograms to be plotted (marginal distributions) and we can analyse the distributions of both variables in the regression plots.

For the regression plots, the title, axes (xlab, ylab) and source (caption) are labelled appropriately. The y axis label is rotated to improve readability. The plots are coloured by year. Note that since the filtered data tables contain data only from year 2019, there will only be one colour seen here. Reference lines are drawn at zero to allow users to distinguish between positive and negative easily. The default 95% confidence interval is used and the results of the statistical tests are displayed. From the plots, we note that the distributions of all the variables follow a normal distribution. We will further discuss the results in the later section.

In addition, we will use the olsrr package to show the results of each regression plot, as it’s easier to understand and interpret.

ggscatterstats(filtered_master1, employment_growth, GDP_per_capita_growth,
               title = "Relationship Between GDP Per Capita Growth & Employment Growth",
               caption = "Source: LinkedIn & World Bank Group",
               ggplot.component = list(ggplot2::
                        aes(employment_growth, GDP_per_capita_growth,
                            colour = year),
                        xlab("Employment Growth"),
                        ylab("GDP Per\nCapita\nGrowth"),
                        theme(axis.title.y = element_text(angle = 0)),
                        geom_hline(yintercept = 0,
                                   linetype = "dashed",
                                   color = "grey60",
                                   size = 1),
                        geom_vline(xintercept = 0,
                                   linetype = "dashed",
                                   color = "grey60",
                                   size = 1)))

ols_regress(GDP_per_capita_growth ~ employment_growth, filtered_master1)
                          Model Summary                           
-----------------------------------------------------------------
R                       0.067       RMSE                   0.052 
R-Squared               0.004       Coef. Var          -1103.029 
Adj. R-Squared          0.004       MSE                    0.003 
Pred R-Squared          0.004       MAE                    0.036 
-----------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                ANOVA                                 
---------------------------------------------------------------------
               Sum of                                                
              Squares          DF    Mean Square      F         Sig. 
---------------------------------------------------------------------
Regression      0.075           1          0.075    27.976    0.0000 
Residual       16.824        6267          0.003                     
Total          16.899        6268                                    
---------------------------------------------------------------------

                                     Parameter Estimates                                       
----------------------------------------------------------------------------------------------
            model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
----------------------------------------------------------------------------------------------
      (Intercept)    -0.004         0.001                 -6.351    0.000    -0.005    -0.003 
employment_growth    -0.144         0.027       -0.067    -5.289    0.000    -0.197    -0.091 
----------------------------------------------------------------------------------------------
ggscatterstats(filtered_master1, industry_migration, GDP_per_capita_growth,
               title = "Relationship Between GDP Per Capita Growth & Industry Migration",
               caption = "Source: LinkedIn & World Bank Group",
               ggplot.component = list(ggplot2::
                        aes(industry_migration, GDP_per_capita_growth,
                            colour = year),
                        xlab("Industry Migration (Net Per 10)"),
                        ylab("GDP Per\nCapita\nGrowth"),
                        theme(axis.title.y = element_text(angle = 0)),
                        geom_hline(yintercept = 0,
                                   linetype = "dashed",
                                   color = "grey60",
                                   size = 1),
                        geom_vline(xintercept = 0,
                                   linetype = "dashed",
                                   color = "grey60",
                                   size = 1)))

ols_regress(GDP_per_capita_growth ~ industry_migration, filtered_master1)
                          Model Summary                           
-----------------------------------------------------------------
R                       0.108       RMSE                   0.052 
R-Squared               0.012       Coef. Var          -1099.075 
Adj. R-Squared          0.011       MSE                    0.003 
Pred R-Squared          0.011       MAE                    0.035 
-----------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                ANOVA                                 
---------------------------------------------------------------------
               Sum of                                                
              Squares          DF    Mean Square      F         Sig. 
---------------------------------------------------------------------
Regression      0.196           1          0.196    73.357    0.0000 
Residual       16.704        6267          0.003                     
Total          16.899        6268                                    
---------------------------------------------------------------------

                                      Parameter Estimates                                       
-----------------------------------------------------------------------------------------------
             model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
-----------------------------------------------------------------------------------------------
       (Intercept)    -0.004         0.001                 -6.049    0.000    -0.005    -0.003 
industry_migration    -0.030         0.003       -0.108    -8.565    0.000    -0.036    -0.023 
-----------------------------------------------------------------------------------------------
ggscatterstats(filtered_master2, skill_migration, GDP_per_capita_growth,
               title = "Relationship Between GDP Per Capita Growth & Skill Migration",
               caption = "Source: LinkedIn & World Bank Group",
               ggplot.component = list(ggplot2::
                        aes(skill_migration, GDP_per_capita_growth,
                            colour = year),
                        xlab("Skill Migration (Net Per 10)"),
                        ylab("GDP Per\nCapita\nGrowth"),
                        theme(axis.title.y = element_text(angle = 0)),
                        geom_hline(yintercept = 0,
                                   linetype = "dashed",
                                   color = "grey60",
                                   size = 1),
                        geom_vline(xintercept = 0,
                                   linetype = "dashed",
                                   color = "grey60",
                                   size = 1)))

ols_regress(GDP_per_capita_growth ~ skill_migration, filtered_master2)
                         Model Summary                           
----------------------------------------------------------------
R                       0.129       RMSE                  0.050 
R-Squared               0.017       Coef. Var          -810.651 
Adj. R-Squared          0.017       MSE                   0.003 
Pred R-Squared          0.016       MAE                   0.034 
----------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                ANOVA                                 
---------------------------------------------------------------------
               Sum of                                                
              Squares          DF    Mean Square      F         Sig. 
---------------------------------------------------------------------
Regression      0.237           1          0.237    93.927    0.0000 
Residual       13.971        5537          0.003                     
Total          14.208        5538                                    
---------------------------------------------------------------------

                                    Parameter Estimates                                      
--------------------------------------------------------------------------------------------
          model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
--------------------------------------------------------------------------------------------
    (Intercept)    -0.005         0.001                 -7.341    0.000    -0.006    -0.004 
skill_migration    -0.037         0.004       -0.129    -9.692    0.000    -0.044    -0.029 
--------------------------------------------------------------------------------------------
ggscatterstats(filtered_master1, industry_migration, employment_growth,
               title = "Relationship Between Employment Growth & Industry Migration",
               caption = "Source: LinkedIn & World Bank Group",
               ggplot.component = list(ggplot2::
                        aes(industry_migration, employment_growth,
                            colour = year),
                        xlab("Industry Migration (Net Per 10)"),
                        ylab("Employment\nGrowth"),
                        theme(axis.title.y = element_text(angle = 0)),
                        geom_hline(yintercept = 0,
                                   linetype = "dashed",
                                   color = "grey60",
                                   size = 1),
                        geom_vline(xintercept = 0,
                                   linetype = "dashed",
                                   color = "grey60",
                                   size = 1)))

ols_regress(employment_growth ~ industry_migration, filtered_master1)
                         Model Summary                          
---------------------------------------------------------------
R                       0.655       RMSE                 0.019 
R-Squared               0.429       Coef. Var          733.385 
Adj. R-Squared          0.428       MSE                  0.000 
Pred R-Squared          0.428       MAE                  0.014 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                 ANOVA                                  
-----------------------------------------------------------------------
               Sum of                                                  
              Squares          DF    Mean Square       F          Sig. 
-----------------------------------------------------------------------
Regression      1.680           1          1.680    4803.486    0.0000 
Residual        2.240        6406          0.000                       
Total           3.920        6407                                      
-----------------------------------------------------------------------

                                    Parameter Estimates                                      
--------------------------------------------------------------------------------------------
             model     Beta    Std. Error    Std. Beta      t        Sig     lower    upper 
--------------------------------------------------------------------------------------------
       (Intercept)    0.002         0.000                  7.822    0.000    0.001    0.002 
industry_migration    0.074         0.001        0.655    69.307    0.000    0.072    0.076 
--------------------------------------------------------------------------------------------
ggscatterstats(filtered_master2, skill_migration, employment_growth,
               title = "Relationship Between Employment Growth & Skill Migration",
               caption = "Source: LinkedIn & World Bank Group",
               ggplot.component = list(ggplot2::
                        aes(skill_migration, employment_growth,
                            colour = year),
                        xlab("Skill Migration (Net Per 10)"),
                        ylab("Employment\nGrowth"),
                        theme(axis.title.y = element_text(angle = 0)),
                        geom_hline(yintercept = 0,
                                   linetype = "dashed",
                                   color = "grey60",
                                   size = 1),
                        geom_vline(xintercept = 0,
                                   linetype = "dashed",
                                   color = "grey60",
                                   size = 1)))

ols_regress(employment_growth ~ skill_migration, filtered_master2)
                         Model Summary                          
---------------------------------------------------------------
R                       0.498       RMSE                 0.019 
R-Squared               0.248       Coef. Var          632.051 
Adj. R-Squared          0.248       MSE                  0.000 
Pred R-Squared          0.248       MAE                  0.013 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                 ANOVA                                  
-----------------------------------------------------------------------
               Sum of                                                  
              Squares          DF    Mean Square       F          Sig. 
-----------------------------------------------------------------------
Regression      0.647           1          0.647    1866.413    0.0000 
Residual        1.958        5648          0.000                       
Total           2.604        5649                                      
-----------------------------------------------------------------------

                                   Parameter Estimates                                    
-----------------------------------------------------------------------------------------
          model     Beta    Std. Error    Std. Beta      t        Sig     lower    upper 
-----------------------------------------------------------------------------------------
    (Intercept)    0.002         0.000                  8.863    0.000    0.002    0.003 
skill_migration    0.048         0.001        0.498    43.202    0.000    0.046    0.050 
-----------------------------------------------------------------------------------------

2.7 Correlation Plots

Next, we will find out the correlation of the variables. Given that there are very few variables (only 4), we will use ggcorrmat (under ggstatsplot package) instead of corrplot and other packages, as it will produce a simple correlation plot. We will use the default properties i.e. upper triangular matrix and significance level of 0.05. Both filtered data tables are used for the correlation plots.

It is noted that GDP per capita growth is weakly correlated with employment growth, industry and skill migration. Industry and skill migration are highly correlated (>0.8) for both plots. The results will be further discussed in the later section.

ggcorrmat(filtered_master1,
          cor.vars = c(7, 10, 11, 12))

ggcorrmat(filtered_master2,
          cor.vars = c(7, 10, 11, 12))

2.8 Multiple Regression

In addition to the simple regression plots, we will build a multiple regression model in this assignment using the olsrr package to study the relationship between GDP per capita growth and multiple variables i.e. employment growth, industry and skill migration. 3 multiple regression models are created, one with employment growth, industry and skill migration as dependent variables and we will remove industry migration or skill migration for the other two, as they are highly correlated, as seen in the earlier correlation plots.

All the regression models show weak relationship (low R-squared values) between GDP per capita growth and the dependent variables. We will further discuss the results in the later section.

ols_regress(GDP_per_capita_growth ~ employment_growth + industry_migration
            + skill_migration,
            filtered_master1)
                          Model Summary                           
-----------------------------------------------------------------
R                       0.118       RMSE                   0.052 
R-Squared               0.014       Coef. Var          -1097.910 
Adj. R-Squared          0.014       MSE                    0.003 
Pred R-Squared          0.013       MAE                    0.035 
-----------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                ANOVA                                 
---------------------------------------------------------------------
               Sum of                                                
              Squares          DF    Mean Square      F         Sig. 
---------------------------------------------------------------------
Regression      0.236           3          0.079    29.608    0.0000 
Residual       16.663        6265          0.003                     
Total          16.899        6268                                    
---------------------------------------------------------------------

                                      Parameter Estimates                                       
-----------------------------------------------------------------------------------------------
             model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
-----------------------------------------------------------------------------------------------
       (Intercept)    -0.004         0.001                 -6.108    0.000    -0.005    -0.003 
 employment_growth     0.000         0.035        0.000     0.013    0.990    -0.068     0.069 
industry_migration    -0.012         0.006       -0.043    -1.844    0.065    -0.024     0.001 
   skill_migration    -0.022         0.006       -0.081    -3.911    0.000    -0.033    -0.011 
-----------------------------------------------------------------------------------------------
ols_regress(GDP_per_capita_growth ~ employment_growth + industry_migration,
            filtered_master1)
                          Model Summary                           
-----------------------------------------------------------------
R                       0.108       RMSE                   0.052 
R-Squared               0.012       Coef. Var          -1099.162 
Adj. R-Squared          0.011       MSE                    0.003 
Pred R-Squared          0.011       MAE                    0.035 
-----------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                ANOVA                                 
---------------------------------------------------------------------
               Sum of                                                
              Squares          DF    Mean Square      F         Sig. 
---------------------------------------------------------------------
Regression      0.196           2          0.098    36.678    0.0000 
Residual       16.704        6266          0.003                     
Total          16.899        6268                                    
---------------------------------------------------------------------

                                      Parameter Estimates                                       
-----------------------------------------------------------------------------------------------
             model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
-----------------------------------------------------------------------------------------------
       (Intercept)    -0.004         0.001                 -6.038    0.000    -0.005    -0.003 
 employment_growth     0.004         0.035        0.002     0.105    0.917    -0.065     0.072 
industry_migration    -0.030         0.004       -0.109    -6.722    0.000    -0.039    -0.021 
-----------------------------------------------------------------------------------------------
ols_regress(GDP_per_capita_growth ~ employment_growth + skill_migration,
            filtered_master1)
                          Model Summary                           
-----------------------------------------------------------------
R                       0.116       RMSE                   0.052 
R-Squared               0.013       Coef. Var          -1098.120 
Adj. R-Squared          0.013       MSE                    0.003 
Pred R-Squared          0.012       MAE                    0.035 
-----------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                ANOVA                                 
---------------------------------------------------------------------
               Sum of                                                
              Squares          DF    Mean Square      F         Sig. 
---------------------------------------------------------------------
Regression      0.227           2          0.114    42.695    0.0000 
Residual       16.672        6266          0.003                     
Total          16.899        6268                                    
---------------------------------------------------------------------

                                     Parameter Estimates                                       
----------------------------------------------------------------------------------------------
            model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
----------------------------------------------------------------------------------------------
      (Intercept)    -0.004         0.001                 -6.197    0.000    -0.005    -0.003 
employment_growth    -0.029         0.031       -0.013    -0.922    0.356    -0.090     0.032 
  skill_migration    -0.030         0.004       -0.109    -7.561    0.000    -0.037    -0.022 
----------------------------------------------------------------------------------------------

2.9 Stepwise Regression

Finally, we also carry out stepwise regression to remove variable(s) that is not important to the regression model, based on the p value. The olsrr package (ols_step_backward_aic) is used to carry out backward stepwise regression.

ols_step_backward_aic(lm(GDP_per_capita_growth ~ employment_growth + industry_migration
                         + skill_migration, filtered_master1))

                        Backward Elimination Summary                         
---------------------------------------------------------------------------
Variable                AIC         RSS      Sum Sq     R-Sq      Adj. R-Sq 
---------------------------------------------------------------------------
Full Model           -19375.641    16.663     0.236    0.01398      0.01351 
employment_growth    -19377.640    16.663     0.236    0.01398      0.01366 
---------------------------------------------------------------------------

As seen in the backward stepwise regression result, employment growth is removed. We will build a regression model without employment growth to explore the results. However, the regression model still shows a weak relationship (low R-squared values) between GDP per capita growth and the dependent variables. We will further discuss the results in the later section.

ols_regress(GDP_per_capita_growth ~ industry_migration + skill_migration, filtered_master1)
                          Model Summary                           
-----------------------------------------------------------------
R                       0.118       RMSE                   0.052 
R-Squared               0.014       Coef. Var          -1097.822 
Adj. R-Squared          0.014       MSE                    0.003 
Pred R-Squared          0.013       MAE                    0.035 
-----------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 

                                ANOVA                                 
---------------------------------------------------------------------
               Sum of                                                
              Squares          DF    Mean Square      F         Sig. 
---------------------------------------------------------------------
Regression      0.236           2          0.118    44.419    0.0000 
Residual       16.663        6266          0.003                     
Total          16.899        6268                                    
---------------------------------------------------------------------

                                      Parameter Estimates                                       
-----------------------------------------------------------------------------------------------
             model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
-----------------------------------------------------------------------------------------------
       (Intercept)    -0.004         0.001                 -6.126    0.000    -0.005    -0.003 
industry_migration    -0.012         0.006       -0.043    -2.062    0.039    -0.023    -0.001 
   skill_migration    -0.022         0.006       -0.081    -3.913    0.000    -0.033    -0.011 
-----------------------------------------------------------------------------------------------

3.0 Final Data Visualisation

Subsequently, we will combine the interactive filters, regression plot and correlation plot on R Shiny app to produce an interactive, user-friendly dashboard. Figure 9 shows the final dashboard or data visualisation.

Figure 9: Final Data Visualisation

3.1 Discussion of Results

Finally, we discuss the results for the filtered dataset, which is for year 2019, information and communication industry and tech skills.